Resilience to Device Driver Failures Using Virtualization

Authors

  • Michael Le
  • Yuval Tamir

Abstract

Faulty device drivers are a significant cause of system failures. Low-overhead mechanisms that leverage virtualization can detect and recover from device driver failures without requiring modifications to the device driver, applications, or OS running in the VMs. These mechanisms can vary significantly in terms of coverage, recovery latency, and implementation complexity. This paper explores the design space of such mechanisms, provides a taxonomy for their characterization, and evaluates key points in the design space. Based on full implementations of a variety of mechanisms, design tradeoffs are described and key implementation challenges are identified. Schemes are evaluated on a variety of system configurations with multiple devices and multiple VMs running applications. Extensive fault injection campaigns are used to evaluate the effectiveness of the different mechanisms. It is shown that simple recovery schemes, transparent to the VMs running applications, can effectively recover from a very high percentage of faults. However, in order to minimize service interruption duration, it is necessary to use schemes that are slightly more complex, involving redundant device controllers.

Index Terms: Fault tolerance, recovery, virtual machine, VMM, hypervisor, network, storage

Similar Resources

Quest-V: A Virtualized Multikernel for High-Confidence Systems

This paper outlines the design of ‘Quest-V’, which is implemented as a collection of separate kernels operating together as a distributed system on a chip. Quest-V uses virtualization techniques to isolate kernels and prevent local faults from affecting remote kernels. This leads to a high-confidence multikernel approach, where failures of system subcomponents do not render the entire system in...

Evaluating Multipath TCP Resilience against Link Failures

Standard TCP is the de facto reliable transfer protocol for the Internet. It is designed to establish a reliable connection using only a single network interface. However, standard TCP with single interfacing performs poorly due to intermittent node connectivity. This requires the re-establishment of connections as the IP addresses change. Multi-path TCP (MPTCP) has emerged to utilize multiple ...

Improving Device Driver Reliability through Decoupled Dynamic Binary Analyses

Device drivers are Operating Systems (OS) extensions that enable the use of I/O devices in computing systems. However, studies have identified drivers as an Achilles’ heel of system reliability, their high fault rate accounting for a significant portion of system failures. Consequently, significant effort has been directed towards improving system robustness by protecting system components (e.g...

A Case for Virtual Machine Based Fault Injection in a High-Performance Computing Environment

Large-scale computing platforms provide tremendous capabilities for scientific discovery. As applications and system software scale up to multipetaflops and beyond to exascale platforms, the occurrence of failure will be much more common. This has given rise to a push in fault-tolerance and resilience research for high-performance computing (HPC) systems. This includes work on log analysis to i...

SR-IOV Networking in Xen: Architecture, Design and Implementation

SR-IOV capable network devices offer the benefits of direct I/O throughput and reduced CPU utilization while greatly increasing the scalability and sharing capabilities of the device. SR-IOV allows the benefits of the paravirtualized driver’s throughput increase and additional CPU usage reductions in HVMs (Hardware Virtual Machines). SR-IOV uses direct I/O assignment of a network device to mult...

Publication date: 2013